Introduction

In this vignette, we apply the Spatial Logistic Gaussian Process (SLGP) model to estimate temperature distributions at meteorological stations across Switzerland. This example illustrates SLGP’s capacity to model spatial temperature variations based on location-specific factors, such as latitude, longitude, and altitude, using real meteorological data made available by MeteoSwiss Swiss Federal Office of Meteorology and Climatology MeteoSwiss (2019). The altitude profile of Switzerland is based on data from Swiss Federal Office of Topography Swisstopo (2019).

Data Loading and Visualization

We begin by loading and visualizing the dataset of daily average temperatures at 29 meteorological stations across Switzerland. This initial step provides insight into temperature variability across different regions and altitudes.

# Dataset
X <- read.csv("./data_meteo.txt", encoding = "latin1")
stations <- unique(X[, c(1, 8:13)])

world_map <- map_data("world", region="Switzerland") #Load a map
Meteorological stations where measurements are available

Meteorological stations where measurements are available

To ensure a smooth modeling process, we prepare the dataset by renaming variables for clarity and storing their respective value ranges. This step facilitates easier handling of variables in subsequent modeling steps.

#Ready to normalise
range_temp <- c(-30, 40)
range_lat <- range(topography$lat)
range_long <- range(topography$long)
range_height <- c(0, 4810)

#Extract the relevant columns
samples <- X[, c("tre200d0", "Latitude", "Longitude", "Station.height.m..a..sea.level")]
colnames(samples) <- c("Temperature", "Latitude", "Longitude", "Altitude")
summary(samples)
##   Temperature         Latitude       Longitude        Altitude   
##  Min.   :-21.600   Min.   :45.87   Min.   :6.128   Min.   : 273  
##  1st Qu.:  1.500   1st Qu.:46.46   1st Qu.:7.330   1st Qu.: 485  
##  Median :  7.500   Median :46.81   Median :8.333   Median : 958  
##  Mean   :  7.584   Mean   :46.73   Mean   :8.246   Mean   :1109  
##  3rd Qu.: 13.800   3rd Qu.:47.02   3rd Qu.:9.175   3rd Qu.:1605  
##  Max.   : 29.600   Max.   :47.54   Max.   :9.879   Max.   :3571

We will use data from stations outside the canton of Bern for training the SLGP model, leaving the remaining stations as a test set to assess generalization. The model will predict temperature distributions based on the latitude, longitude, and altitude of each station.

Empirical distributions of temperatures at the considered stations

Empirical distributions of temperatures at the considered stations

Initial SLGP Estimation with Arbitrary Hyperparameters

To understand the model’s baseline behavior, we perform an initial SLGP estimation using arbitrary hyperparameters. This first run provides insight into the model’s ability to capture the temperature distribution’s overall structure.

A SLGP estimation with poorly chosen lengthscale (too big)

A SLGP estimation with poorly chosen lengthscale (too big)

The initial results show that while the model captures the general support of the temperature distribution, it fails to capture finer details, such as multi-modalities in certain distributions. This limitation likely arises from using a large length scale, which oversmooths small-scale variations.

To investigate further, we run the SLGP with a much smaller length scale. This adjustment helps reveal the model’s response to hyperparameter tuning and allows us to compare results across different spatial aggregation levels.

A SLGP estimation with poorly chosen lengthscale (too small)

A SLGP estimation with poorly chosen lengthscale (too small)

This time, while the model provides accurate temperature estimates at stations within the training set, it generalizes poorly to test stations. Using too small a length scale restricts the model’s spatial aggregation, limiting the reach of spatial information to a very local scope.

Visualising results at some stations both in and out of the training set

SLGP estimation for some stations both in the training set (=stations out of the Canton of Bern) and in the test set (=stations in the Canton of Bern)

SLGP estimation for some stations both in the training set (=stations out of the Canton of Bern) and in the test set (=stations in the Canton of Bern)

Visualising results at some stations in of the training set

SLGP estimation for some other stations in the training set (=stations out of the Canton of Bern).

SLGP estimation for some other stations in the training set (=stations out of the Canton of Bern).

SLGP estimation for some other stations in the training set (=stations out of the Canton of Bern).

SLGP estimation for some other stations in the training set (=stations out of the Canton of Bern).

SLGP estimation for some other stations in the training set (=stations out of the Canton of Bern).

SLGP estimation for some other stations in the training set (=stations out of the Canton of Bern).

SLGP estimation for some other stations in the training set (=stations out of the Canton of Bern).

SLGP estimation for some other stations in the training set (=stations out of the Canton of Bern).

SLGP estimation for some other stations in the training set (=stations out of the Canton of Bern).

SLGP estimation for some other stations in the training set (=stations out of the Canton of Bern).

Leveraging SLGP estimators for complex inference

We can use the MAP estimator to perform estimation over all of Switzerland.

df <- topography
colnames(df) <- c("Longitude", "Latitude", "Altitude")

disc_long <- 201
disc_lat <- 201
df <- df %>%
  mutate(Longitude=(Longitude-range_long[1])/diff(range_long),
         Latitude=(Latitude-range_lat[1])/diff(range_lat))%>%
  mutate(Longitude=round(Longitude*disc_long)/(disc_long),
         Latitude=round(Latitude*disc_lat)/(disc_lat))%>%
  group_by(Longitude, Latitude) %>%
  summarise(Altitude=median(Altitude), .groups="keep") %>%
  ungroup()%>%
  mutate(Longitude=Longitude*diff(range_long)+range_long[1],
         Latitude=Latitude*diff(range_lat)+range_lat[1]) %>%
  data.frame() 
df2 <- expand.grid(seq(range_temp[1], range_temp[2],, 141), seq(nrow(df)))
df2 <- cbind(df[df2$Var2, ], df2$Var1)
colnames(df2)[4] <- "Temperature"

dfSwissMAP<- data.frame()
rep <- 3*47
n <- nrow(df2)/rep
# Start time
start_time <- Sys.time()
for(i in seq(rep)){
  # Time at the start of each iteration
  iter_start_time <- Sys.time()
  cat(i, "\n")
  temp <- predictSLGP_newNode(SLGPmodel=model3MAP,
                              newNodes = df2[1:n  + (i-1)*n, ],
                              nDiscret=101)
  dfSwissMAP <- rbind(dfSwissMAP, temp)
  
  # Calculate elapsed time since the beginning in minutes
  elapsed_time <- as.numeric(difftime(Sys.time(), start_time, units = "mins"))
  iter_time <- as.numeric(difftime(Sys.time(), iter_start_time, units = "mins"))
  
  if(i%%3==0){
    cat("Saved just in case at ", i%/%3, "/47\n")
    save(dfSwissMAP, file="predictionInSwitzerlandMAP.RData")
  }
  # Print elapsed time and iteration time
  cat("Total elapsed time:", elapsed_time, 
      "mins\nTime for this iteration:", iter_time, "mins\n\n")
}
save(dfSwissMAP, file="predictionInSwitzerlandMAP.RData")

Similarly, we can use the MCMC estimations over all of Switzerland.

df <- topography
colnames(df) <- c("Longitude", "Latitude", "Altitude")

disc_long <- 101
disc_lat <- 101
df <- df %>%
  mutate(Longitude=(Longitude-range_long[1])/diff(range_long),
         Latitude=(Latitude-range_lat[1])/diff(range_lat))%>%
  mutate(Longitude=round(Longitude*disc_long)/(disc_long),
         Latitude=round(Latitude*disc_lat)/(disc_lat))%>%
  group_by(Longitude, Latitude) %>%
  summarise(Altitude=median(Altitude), .groups="keep") %>%
  ungroup()%>%
  mutate(Longitude=Longitude*diff(range_long)+range_long[1],
         Latitude=Latitude*diff(range_lat)+range_lat[1]) %>%
  data.frame() 
df2 <- expand.grid(seq(range_temp[1], range_temp[2],, 141), seq(nrow(df)))
df2 <- cbind(df[df2$Var2, ], df2$Var1)
colnames(df2)[4] <- "Temperature"

rep <- 47*3
n <- nrow(df2)/rep
# Start time
start_time <- Sys.time()
for(i in seq(rep)){
  # Time at the start of each iteration
  iter_start_time <- Sys.time()
  cat(i, "\n")
  temp <- predictSLGP_newNode(SLGPmodel=model3,
                              newNodes = df2[1:n  + (i-1)*n, ],
                              nDiscret=201)
  
  # Calculate elapsed time since the beginning in minutes
  elapsed_time <- as.numeric(difftime(Sys.time(), start_time, units = "mins"))
  iter_time <- as.numeric(difftime(Sys.time(), iter_start_time, units = "mins"))
  
  save(temp, file = paste0("./SavedData/predictionInSwitzerlandMCMC_", i, ".RData"))
  
  # Print elapsed time and iteration time
  cat("Total elapsed time:", elapsed_time, 
      "mins\nTime for this iteration:", iter_time, "mins\n\n")
  
}
gc()

… And now, let us use these!

Temperature-Based Localisation in Switzerland: A Proof of Concept.

In this proof of concept, we demonstrate an “inverse problem” approach: given a temperature observation, we use the MAP (Maximum A Posteriori) estimator of the Spatial Logistic Gaussian Process (SLGP) model to identify likely regions in Switzerland where this temperature could have been observed. This allows us to infer potential locations based on specific temperature measurements.

An inverse problem solved with SLGP modelling: temeprature based localisation

An inverse problem solved with SLGP modelling: temeprature based localisation

These results are consistent with Switzerland’s diverse geography:

This approach offers a sight into the potential of SLGP for solving stochastic inverse problems.

Probabilistic prediction of quantities of interest

We further leverage the probabilistic predictions of our SLGP model over Switzerland. We will visualize the joint quantile estimates along with their associated uncertainties provided by our model. However,directly visualising and aggregating these quantiles and uncertainties in 2D can be challenging. To provide clearer and more intuitive visualisations, we focus on an arbitrary slice crossing Switzerland. This slice allows us to explore the model’s predictions and uncertainties in a more interpretable way.

We begin by loading the predictions of our SLGP model over Switzerland and visualise the slices considered on a map of the country.

# Recombine chunks
data_chunks <- list()

for (i in 1:141) {  # Replace with actual number of chunks
  load(paste0("./SavedData/predictionInSwitzerlandMCMC_", i, ".RData"))
  data_chunks[[i]] <- temp
}

# Recombine all chunks into a single object if it's a data frame or list
dfSwissMCMC <- do.call("rbind", data_chunks)  # Adjust based on data structure
rm(data_chunks)
We select a slice for our visualisation. Here, we decided to take one slice that would cross through 4 stations.
A slice across Switzerland

A slice across Switzerland

We project on this line and compute our quantities of interest (the quantiles).

# Extract the chunks on the slice and project them in 1D
df_plot_meteo3 <- subset(df_plot_meteo2, df_plot_meteo2$subset1)
df_plot_meteo3$x2 <- (df_plot_meteo3$Latitude - range_lat[1])/diff(range_lat)
df_plot_meteo3$x1 <- (df_plot_meteo3$Longitude - range_long[1])/diff(range_long)
# line y=-1.05 x+ 0.85
library(StereoMorph)
orth_vect <- orthogonalProjectionToLine(df_plot_meteo3[, c("x1", "x2")], l1 = c(0, intercept1), 
                                        l2 = c(1, -slope1+intercept1))
df_plot_meteo3$x1proj <- orth_vect[, 1]
df_plot_meteo3$x2proj <- orth_vect[, 2]

dir1 <- c(1, -slope1)
dir1 <- dir1 / sqrt(sum(dir1^2))
df_plot_meteo3$place_on_line <- apply(df_plot_meteo3[, c("x1proj", "x2proj")], 
                                      1, function(u) {
                                        return(sum((u - c(0, intercept1)) * dir1))
                                      })
r1 <- range(df_plot_meteo3$place_on_line)
df_plot_meteo3$place_on_line <- (df_plot_meteo3$place_on_line-r1[1])/diff(r1)

df_plot_meteo3 <-  df_plot_meteo3%>%
  dplyr::select(-x1proj, -x2proj, -x1, -x2, -subset1, -subset2)%>%
  pivot_longer(-c("Latitude", "Longitude", "Altitude", "Temperature", "place_on_line", "Station"))%>%
  mutate(simu=substr(name, 5, 11),
         valuepdf=value) %>%
  group_by(Latitude, Longitude, Altitude, simu)%>%
  arrange(Temperature) %>%   # Sort by ascending Temperature within groups
  mutate(valuecdf=cumsum(value))%>%
  mutate(valuecdf=valuecdf-min(valuecdf))%>%
  mutate(valuecdf=valuecdf/max(valuecdf))%>%
  ungroup()%>%
  dplyr::select(-c("name", "value"))%>%
  data.frame()

df_plot_meteo_summary_1 <- df_plot_meteo3 %>%
  group_by(place_on_line, simu)%>%
  summarise(Altitude=mean(Altitude),
            mean = mean(Temperature*valuepdf),
            Station = ifelse(all(is.na(Station)), NA, unique(na.omit(Station))),
            q10 = (Temperature[which.min(abs(valuecdf-0.1))])[1],
            q20 = (Temperature[which.min(abs(valuecdf-0.2))])[1],
            q30 = (Temperature[which.min(abs(valuecdf-0.3))])[1],
            q40 = (Temperature[which.min(abs(valuecdf-0.4))])[1],
            q50 = (Temperature[which.min(abs(valuecdf-0.5))])[1],
            q60 = (Temperature[which.min(abs(valuecdf-0.6))])[1],
            q70 = (Temperature[which.min(abs(valuecdf-0.7))])[1],
            q80 = (Temperature[which.min(abs(valuecdf-0.8))])[1],
            q90 = (Temperature[which.min(abs(valuecdf-0.9))])[1],
            .groups="keep")%>%
  ungroup()%>%
  data.frame()


stations_other <- subset(df_plot_meteo_summary_1,
                         !is.na(df_plot_meteo_summary_1$Station))
stations_other <- subset(stations_other,
                         stations_other$simu==1)

We visualise the topographic profile of the slice with the location of the stations alongside it, as well as our probabilistic joint prediction of the Temperature quantiles along this slice.

A slice across Switzerland: Location on a map, Altitude Profile, Simultaneous quantile prediction and associated uncertainty

A slice across Switzerland: Location on a map, Altitude Profile, Simultaneous quantile prediction and associated uncertainty

It clearly appears that SLGP modelling captures a strong relationship between the altitude and the temperature values. In practice, the Federal Office of Meteorology and Climatology states that states that: “With every 100 metres, the temperature drops by an average of 0.65°C. Where the air is very dry, such as in an area of high pressure, the air can cool by almost 1°C per 100 metres. This process depends on the air pressure, heat radiation and water-vapour content of the air.”

Therefore, we propose representing the bottom panel of the previous figure, to visually assess whether removing this trend yields mostly flat quantiles.

It is indeed a strong improvement, which suggests that specific pre-processing of the data could improve the statistical inference. However, this is not the core of this contribution, as we focus on SLGPs as general models.

Instead, we display yet another by-product of SLGP-based density field estimations and visualise directly some quantities of interest (such as mean expected temperature, mean quantile) and the associated point-wise uncertainty (standard deviation, inter-quartile range) over the map of Switzerland.

df_plot_meteo2 <- df_plot_meteo2 %>%
  dplyr::select(-subset1, -subset2, -Station)%>%
  group_by(Latitude, Longitude, Altitude)%>%
  arrange(Temperature)%>%
  mutate(ID = cur_group_id()) %>% 
  ungroup()%>%
  arrange(ID, Temperature)
t <- as.numeric(df_plot_meteo2$Temperature[1:141])
value_cdf <- sapply(seq(1000), function(i){
  print(i)
  unlist(tapply(as.matrix(df_plot_meteo2[, 4+i]), INDEX=df_plot_meteo2$ID, function(pdf){
    cdf <-  cumsum(pdf)
    r <- range(cdf)
    cdf <- ((cdf-r[1])/(r[2]-r[1]))
    # invcdf <- approx(x=cdf, y=t, xout=c(0.1, 0.5, 0.9))
    return(cdf)
  }))
})
save(value_cdf, file="valuesCDFOverSwitzerland.RData")

qtyOfInt1 <- sapply(seq(1000), function(i){
  print(i)
  unlist(tapply(as.matrix(df_plot_meteo2[, 4+i]), INDEX=df_plot_meteo2$ID, function(pdf){
    m <- sum(t*pdf)/sum(pdf)
    return(m)
  }))
})
qtyOfInt2 <- sapply(seq(1000), function(i){
  print(i)
  unlist(tapply(value_cdf[, i], INDEX=df_plot_meteo2$ID, function(cdf){
    invcdf <- approx(x=cdf, y=t, xout=c(0.1, 0.5, 0.9))$y
    return(invcdf)
  }))
})
namesVec <- c(rep("Expected value", nrow(qtyOfInt1)),
              rep(c("Quantile: 10%", "Quantile: 50%", "Quantile: 90%"), nrow(qtyOfInt1)))
IDVec <- c(seq(nrow(qtyOfInt1)), sapply(seq(nrow(qtyOfInt1)), function(x){rep(x, 3)}))
df_plot_meteo2 <- df_plot_meteo2 %>%
  dplyr::select(Latitude, Longitude, Altitude, ID)%>%
  group_by(Latitude, Longitude, Altitude, ID)%>%
  unique()%>%
  arrange(ID)
colnames(qtyOfInt1) <- colnames(qtyOfInt2) <- seq(1000)
df_plot_meteo2 <- cbind(df_plot_meteo2[IDVec, ], namesVec, rbind(qtyOfInt1, qtyOfInt2))
colnames(df_plot_meteo2)[5] <- c("Quantity")
save(df_plot_meteo2, file="valuesQtyOverSwitzerland.RData")
rm(qtyOfInt1, qtyOfInt2)
gc()
load(file="valuesQtyOverSwitzerland.RData")
df_plot_meteo2 <-  df_plot_meteo2 %>%
  pivot_longer(-c("Latitude", "Longitude", "Altitude", "ID", "Quantity"))%>%
  group_by(Latitude, Longitude, Altitude, ID, Quantity) %>%
  summarise(meanValue = mean(value),
            medianValue = median(value),
            sd = sqrt(var(value)),
            iq = diff(quantile(value, probs=c(0.25, 0.75))),
            .groups="keep")%>%
  data.frame()
We now represent the corresponding maps.
Summary of the SLGP probabilistic predictions for the expected temperatures in Switzerland.

Summary of the SLGP probabilistic predictions for the expected temperatures in Switzerland.

Summary of the SLGP probabilistic predictions for the 10% quantile of temperatures in Switzerland.

Summary of the SLGP probabilistic predictions for the 10% quantile of temperatures in Switzerland.

Summary of the SLGP probabilistic predictions for the 50% quantile of temperatures in Switzerland.

Summary of the SLGP probabilistic predictions for the 50% quantile of temperatures in Switzerland.

Summary of the SLGP probabilistic predictions for the 90% quantile of temperatures in Switzerland.

Summary of the SLGP probabilistic predictions for the 90% quantile of temperatures in Switzerland.


References

Swiss Federal Office of Meteorology and Climatology MeteoSwiss. 2019. “Climatological Network - Daily Values.” https://opendata.swiss/en/dataset/klimamessnetz-tageswerte.
Swiss Federal Office of Topography Swisstopo. 2019. “The Digital Height Model of Switzerland with a 200m Grid.” https://opendata.swiss/en/dataset/das-digitale-hohenmodell-der-schweiz-mit-einer-maschenweite-von-200-m.